[SPARK-19152][SQL][followup] simplify CreateHiveTableAsSelectCommand#16693
[SPARK-19152][SQL][followup] simplify CreateHiveTableAsSelectCommand#16693cloud-fan wants to merge 2 commits intoapache:masterfrom
Conversation
|
Test build #71933 has finished for PR 16693 at commit
|
| def start(): StreamingQuery = { | ||
| if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) { | ||
| throw new AnalysisException("Hive data source can only be used with tables, you can not " + | ||
| "read files of Hive data source directly.") |
There was a problem hiding this comment.
This is not to read but write the results to Hive tables, right?
| def load(): DataFrame = { | ||
| if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) { | ||
| throw new AnalysisException("Hive data source can only be used with tables, you can not " + | ||
| "write files of Hive data source directly.") |
There was a problem hiding this comment.
This is to read the streaming data from Hive tables, right? I think we need to fix the error message.
| return Seq.empty | ||
| } | ||
| sparkSession.sessionState.executePlan(InsertIntoTable( | ||
| metastoreRelation, Map(), query, overwrite = false, ifNotExists = false)).toRdd |
There was a problem hiding this comment.
uh... Previously, we try to create the table even if the table still exists. A good change!
| val withSchema = if (withFormat.schema.isEmpty) { | ||
| tableDesc.copy(schema = query.schema) | ||
| } else { | ||
| withFormat |
There was a problem hiding this comment.
To the other reviewers, this is not needed, because the schema is always empty when we need to create a table. See the assert here..
| tableDesc.storage.outputFormat | ||
| .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)), | ||
| serde = tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)), | ||
| compressed = tableDesc.storage.compressed) |
There was a problem hiding this comment.
Actually, after the code refactoring, this is always ensured in the rule DetermineHiveSerde.
|
LGTM except two minor comments in the error messages. |
|
@gatorsmile thanks for adding comments about why the cleanup is safe! |
|
Test build #72022 has finished for PR 16693 at commit
|
|
retest this please |
1 similar comment
|
retest this please |
|
ok to test |
|
Test build #72109 has finished for PR 16693 at commit
|
|
Thanks! Merging to master. |
## What changes were proposed in this pull request? After apache#16552 , `CreateHiveTableAsSelectCommand` becomes very similar to `CreateDataSourceTableAsSelectCommand`, and we can further simplify it by only creating table in the table-not-exist branch. This PR also adds hive provider checking in DataStream reader/writer, which is missed in apache#16552 ## How was this patch tested? N/A Author: Wenchen Fan <wenchen@databricks.com> Closes apache#16693 from cloud-fan/minor.
What changes were proposed in this pull request?
After #16552 ,
CreateHiveTableAsSelectCommandbecomes very similar toCreateDataSourceTableAsSelectCommand, and we can further simplify it by only creating table in the table-not-exist branch.This PR also adds hive provider checking in DataStream reader/writer, which is missed in #16552
How was this patch tested?
N/A